Replacing 6T SRAMs with 3T1D DRAMs in the L1 data cache to combat process variability
With continued technology scaling, process variations will be especially detrimental to six-transistor static memory structures (6T SRAMs). A memory architecture using three-transistor, one-diode DRAM (3T1D) cells in the L1 data cache tolerates wide process variations with little performance degradation, making it a promising choice for on-chip cache structures for next-generation microprocessors.
Peer reviewed. Postprint (published version)
RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN
Second-order training methods can converge much faster than first-order
optimizers in DNN training because they use the inverse of the second-order
information (SOI) matrix to find a more accurate descent direction and step
size. However, the huge SOI matrices bring significant computational and
memory overheads on traditional architectures such as GPUs and CPUs. On the
other hand, ReRAM-based processing-in-memory (PIM) technology is well suited
to second-order training for the following three reasons: First, PIM's
computation happens in memory, which reduces data
movement overheads; Second, ReRAM crossbars can compute the SOI's inversion in
O(1) time; Third, if architected properly, ReRAM crossbars can perform
matrix inversion and vector-matrix multiplications, which are important to
second-order training algorithms.
Nevertheless, current ReRAM-based PIM techniques still face a key challenge
in accelerating second-order training. Existing ReRAM-based matrix
inversion circuitry supports only 8-bit-accurate matrix inversion, and this
computational precision is not sufficient for second-order training, which
needs at least 16-bit-accurate matrix inversion. In this work, we propose a
method to achieve high-precision matrix inversion based on proven 8-bit
matrix inversion (INV) circuitry and vector-matrix multiplication (VMM)
circuitry. We design RePAST, a ReRAM-based PIM accelerator architecture
for second-order training. Moreover, we propose a software mapping scheme
for RePAST that further optimizes performance by fusing VMM and INV
crossbars. Experiments show that RePAST achieves an average
115.8×/11.4× speedup and 41.9×/12.8× energy saving
compared to a GPU counterpart and PipeLayer on large-scale DNNs.
Comment: 13 pages, 13 figures
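The abstract above does not say how RePAST combines the 8-bit INV circuitry with VMM to reach higher precision. As an illustration only, the following minimal sketch uses classical iterative refinement, a standard numerical technique that pairs a low-precision solver with higher-precision residual products. It is not claimed to be the paper's method. All function names are hypothetical, and `quantize` is a crude software stand-in for limited-precision analog output, not a model of any real circuit.

```python
def solve_exact(A, b):
    """Reference solve of A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def quantize(v, bits=8):
    """Round entries to a signed `bits`-bit grid scaled to the largest magnitude."""
    scale = max(abs(t) for t in v)
    if scale == 0.0:
        return v[:]
    levels = 2 ** (bits - 1) - 1  # 127 levels for 8 bits
    return [round(t / scale * levels) / levels * scale for t in v]


def low_precision_solve(A, b, bits=8):
    """Assumed stand-in for a low-precision inversion unit: solve, then quantize."""
    return quantize(solve_exact(A, b), bits)


def matvec(A, x):
    """Matrix-vector product, playing the role of the higher-precision VMM step."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def refine(A, b, iterations=4, bits=8):
    """Iterative refinement: repeatedly solve for the residual and accumulate."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        correction = low_precision_solve(A, residual, bits)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    exact = [2 / 9, 1 / 9, 13 / 9]
    for k in (1, 2, 4):
        err = max(abs(u - v) for u, v in zip(refine(A, b, iterations=k), exact))
        print(f"passes={k} max error={err:.3e}")
```

In this toy model each pass rescales the quantizer to the current correction, so the error bound shrinks by roughly a factor of 254 per pass (half a step on a 127-level grid). That is why a few 8-bit solves plus accurate residuals can exceed 16-bit accuracy here. Real analog hardware adds noise and nonlinearity that this sketch ignores.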
A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality
The Internet of Medical Things (IoMT) is a platform that combines Internet of
Things (IoT) technology with medical applications, enabling the realization of
precision medicine, intelligent healthcare, and telemedicine in the era of
digitalization and intelligence. However, the IoMT faces various challenges,
including sustainable power supply, human adaptability of sensors and the
intelligence of sensors. In this study, we designed a robust and intelligent
IoMT system through the synergistic integration of flexible wearable
triboelectric sensors and deep learning-assisted data analytics. We embedded
four triboelectric sensors into a wristband to detect and analyze limb
movements in patients suffering from Parkinson's Disease (PD). By further
integrating deep learning-assisted data analytics, we actualized an intelligent
healthcare monitoring system for the surveillance and interaction of PD
patients, which includes location/trajectory tracking, heart monitoring and
identity recognition. This innovative approach enabled us to accurately capture
and scrutinize the subtle movements and fine motor activity of PD patients, thus
providing insightful feedback and a comprehensive assessment of the patients'
conditions. This monitoring system is cost-effective, easily fabricated, highly
sensitive, and intelligent, which underscores the immense potential of
human body sensing technology in a Health 4.0 society.
CeO2 Nanowires Inserted into Reduced Graphene Oxide as Active Electrocatalyst for Oxygen Reduction Reaction
Fabrication of an interconnected and conductive nano-architecture is a promising strategy for designing a high-performance, low-cost electrocatalyst for the oxygen reduction reaction (ORR). Herein, a novel hierarchical nano-architecture assembled from graphene nanosheets and CeO2 nanowires (NWs) was developed by a facile hydrothermal process using ethanol/water as solvents without any organic additives. In this framework, graphene oxide (GO) was reduced to graphene, and chemical bonding formed between the GO and the CeO2 NWs during the hydrothermal process. The embedded CeO2 NWs prevented the restacking of the graphene sheets and improved the electrical conductivity of the hybrid catalyst. The effect of different ratios of GO to CeO2 NWs in the hybrid was studied. The GO3-CeO2 NWs composite exhibited better catalytic performance, with slow attenuation and a limiting current density 3.55 and 1.99 times higher than those of CeO2 NWs and pure GO, respectively. The onset potential of GO3-CeO2 NWs is shifted positively by 0.13 V and 0.05 V relative to that of CeO2 NWs and pure GO, respectively, suggesting that the GO3-CeO2 NWs hybrid has excellent stability and activity for ORR. It was found that CeO2 NWs served not only as an effective catalyst but also as an “oxygen buffer” to relieve oxygen insufficiency for ORR.